Extraction of Representative Keywords Considering Co-occurrence in Positive Documents

نویسندگان

  • Byeong Man Kim
  • Qing Li
  • KwangHo Lee
  • Bo-Yeong Kang
چکیده

In linear text classification, user feedback is usually used to tune up the representative keywords (RK) for a certain class. Despite some algorithms (e.g. Rocchio) deal well with user positive and negative feedback to adjust the RKs, few researches have investigated how to adjust RKs only based on a small positive responses which is a popular case in the real-world application (e.g. users tend to click their interested URL). In this work, we describe a method of extracting representative keywords for a user from a small set of his positive feedback documents. Experiments on the Reuters-21578 collection illustrate that our approach is better than other two famous methods (Rocchio and Widrow-Hoff) with 24.8% and 14.5% improvement, respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extraction of user preferences from a few positive documents

In this work, we propose a new method for extracting user preferences from a few documents that might interest users. For this end, we first extract candidate terms and choose a number of terms called initial representative keywords (IRKs) from them through fuzzy inference. Then, by expanding IRKs and reweighting them using term co-occurrence similarity, the final representative keywords are ex...

متن کامل

Mining Association Rules from Unstructured Documents

This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transf...

متن کامل

The analysis of co-citation and word co-occurrence networks of Iranian articles in the field of dentistry

Background and Aims: Dentistry is an important profession ensuring the health of body and soul, and has a special place in the scientific productions of medical disciplines. The purpose of this study was to analyze the co-citation and word co-occurrence of Iranian research papers in the field of dentistry based on indexed documents in Web of Science from 2014 to 2018. Materials and Methods:...

متن کامل

Event Schema Induction Based on Relational Co-occurrence over Multiple Documents

Event schema which comprises a set of related events and participants is of great importance with the development of information extraction (IE) and inducing event schema is prerequisite for IE and natural language generation. Event schema and slots are usually designed manually for traditional IE tasks. Methods for inducing event schemas automatically have been proposed recently. One of the fu...

متن کامل

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005